1 Introduction

The method of types plays a central role in classical information theory: with it, the main theorems can be proved quickly and easily. This work tries to generalise the main ideas of the method of types and to prove one of the central theorems, the achievability of the Holevo capacity, in the quantum setting.

The main problem is the following: suppose that we want to send information with quantum systems. This problem is relevant because 1) quantum computers would prefer this way of communication, and 2) miniaturisation in information technology can lead to this type of problem. The problem can be formalised as follows. We encode our classical message into sequences of quantum states taken from $(\omega_1, \omega_2, \ldots, \omega_l)$. We suppose that there is a single non-reversible quantum transformation $\mathcal{E}(\cdot)$, the quantum channel, which acts on each of these quantum systems. The question is how many bits of information can be transmitted per quantum system. A theorem stated by Gordon and Levitin, and proved by Holevo [4], gives an upper bound on the amount of information that can be communicated. If the sender encodes his information into quantum states with density matrices $\rho_i$ with a priori probabilities $p_i$, then the communicated information cannot be larger than

$$S\Big(\sum_{i=1}^{l} p_i \rho_i\Big) - \sum_{i=1}^{l} p_i S(\rho_i), \tag{1}$$

where $S(\cdot)$ is the von Neumann entropy. If the outputs of the channel are $\rho_i = \mathcal{E}(\omega_i)$, this gives an upper bound. So the problem is to show that this bound can be reached.

In the end, our result is stronger than the work of Holevo [5] or Schumacher and Westmoreland [3], because we show that the decoding can be done by a von Neumann measurement, not only with a POVM (we do not use the "Pretty Good Measurement" as in [2] or [5]). Moreover, we show two uses of the von Neumann measurement.

The first use is that the procedure can be generalised to finite compound channels; that is, we can create an optimal coding scheme that works not only with one quantum channel, but with finitely many. The definition and the capacity are given in Section 4.

The other use is that we can translate the decoding of classical information into the decoding of classical information carried by quantum states, and with the von Neumann measurement the time of the measuring procedure can be shortened dramatically, which means that classical information can be decoded by a quantum apparatus in linear time. (This is a strong result: the best codes which reach the Shannon bound need $n \log^{2} n$ time to decode.)

The basic notation of this work is borrowed from the work of Schumacher and Westmoreland [3], but the basic ideas of the proofs come from the classical theory, e.g. [6].

2 Notations and basic lemmas 

Let $\mathcal{E}(\cdot)$ be a given quantum channel. Assume that $\omega_1, \omega_2, \ldots, \omega_l = \omega_1^l$ are input density matrices of the same dimension $d$, and $P = \{p_1, p_2, \ldots, p_l\}$ is a probability distribution, such that they maximise the Holevo quantity

$$\chi(\mathcal{E}, P, \omega_1^l) = S(\mathcal{E}(\omega)) - \sum_{i=1}^{l} p_i S(\mathcal{E}(\omega_i)), \tag{2}$$

where $\omega = \sum_{i=1}^{l} p_i \omega_i$. Denote the possible outputs of the quantum channel by $\rho_i = \mathcal{E}(\omega_i)$, $\rho = \mathcal{E}(\omega)$; these are represented by $d \times d$ density matrices.

Fix $n$, the length of the (quantum) codewords. We randomly generate $M = 2^{nR}$ codewords of length $n$ with probability distribution $P$. We denote these randomly generated sequences by $\alpha_i$, $1 \le i \le M$. If a statement is true for all indices $1 \le i \le M$, then we will say that it is true for $\alpha$. The $j$-th symbol of $\alpha$ will be denoted by $\alpha(j)$.

For every sequence we can define a quantum sequence as follows:

$$\rho_\alpha = \rho_{\alpha(1)} \otimes \rho_{\alpha(2)} \otimes \cdots \otimes \rho_{\alpha(n)}. \tag{3}$$

We will denote by $S(\rho|\alpha)$ the quantity $\sum_{j=1}^{l} p_j S(\rho_j)$, because $n S(\rho|\alpha)$ is the expected value of the von Neumann entropy of the quantum sequence if we know which randomly generated sequence ($\alpha$) was sent (so we know which basis to use), while $S(\rho)$ is the von Neumann entropy per symbol of the sequence if we do not know which message was sent. So for fixed $P$ and $\omega_1^l$ the Holevo quantity becomes

$$\chi = S(\rho) - S(\rho|\alpha), \tag{4}$$

which resembles the Shannon capacity

$$C = H(Y) - H(Y|X), \tag{5}$$

where $Y$ is the output random variable and $X$ is the input random variable of the channel.
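The resemblance between (4) and (5) can be checked numerically in the special case where all output states $\rho_i$ are diagonal in the same basis, so that the von Neumann entropy reduces to the Shannon entropy of the diagonal. The following is only a minimal sketch under that assumption; the function names and the example states are illustrative and not part of the paper.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def holevo_quantity_diagonal(prior, spectra):
    """chi = S(rho) - sum_i p_i S(rho_i) for commuting (diagonal) states.

    spectra[i] is the diagonal of rho_i; this is an illustrative
    restriction, general states would need an eigenvalue solver.
    """
    d = len(spectra[0])
    average = [sum(p * spec[a] for p, spec in zip(prior, spectra))
               for a in range(d)]
    conditional = sum(p * shannon_entropy(spec)
                      for p, spec in zip(prior, spectra))
    return shannon_entropy(average) - conditional

# Two diagonal qubit states, the analogue of a binary symmetric channel.
prior = [0.5, 0.5]
spectra = [[0.9, 0.1], [0.1, 0.9]]
chi = holevo_quantity_diagonal(prior, spectra)
print(chi)
```

For this choice the value coincides with the mutual information $H(Y) - H(Y|X)$ of a binary symmetric channel with crossover probability 0.1 and uniform input.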
For a fixed $\varepsilon > 0$, a sequence $\alpha$ is called $\varepsilon$-typical with respect to $P$ if

$$2^{-n(H(P)+\varepsilon)} \le P^n(\alpha) \le 2^{-n(H(P)-\varepsilon)}, \tag{6}$$

where $H(P)$ is the Shannon entropy of $P$. We know from the law of large numbers that, if $n$ is large enough, then

$$\mathrm{Prob}(\alpha \text{ is typical}) \ge 1 - \varepsilon \tag{7}$$

(because $\alpha$ was generated by the distribution $P$), see [6].
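Condition (6) is easy to test for a concrete sequence by comparing the logarithm of its probability with $-nH(P)$. The sketch below is illustrative only; the function names and the example distribution are assumptions made for this example.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def is_entropy_typical(sequence, dist, eps):
    """Check (6): 2^{-n(H+eps)} <= P^n(sequence) <= 2^{-n(H-eps)}."""
    n = len(sequence)
    if any(dist[a] == 0 for a in sequence):
        return False  # the sequence has probability zero
    log_prob = sum(math.log2(dist[a]) for a in sequence)
    h = shannon_entropy(dist)
    return -n * (h + eps) <= log_prob <= -n * (h - eps)

dist = [0.9, 0.1]
mixed = [0] * 90 + [1] * 10   # empirical frequencies match dist
all_zero = [0] * 100          # the single most probable sequence

print(is_entropy_typical(mixed, dist, 0.1))
print(is_entropy_typical(all_zero, dist, 0.1))
```

Note that the most probable sequence is not typical: its probability is too large to fall inside the window (6).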

Let the spectral decomposition of $\rho_\alpha$ be $\rho_\alpha = \sum_{k=1}^{d^n} \lambda_{\alpha,k} |s_{\alpha,k}\rangle\langle s_{\alpha,k}|$. Because $\rho_\alpha$ is a tensor product, its eigenvectors are tensor products of eigenvectors of $\rho_{\alpha(1)}, \rho_{\alpha(2)}, \ldots$. So a measurement in the eigenbasis can be represented by a sequence of numbers from $\{1, 2, \ldots, d\}$, where the $i$-th term gives what we would measure if we measured $\rho_{\alpha(i)}$ in its eigenbasis. Denote this correspondence by $s : \{1, 2, \ldots, d^n\} \to \{1, 2, \ldots, d\}^n$, and note that $\lambda_{\alpha,k} = \lambda_{\alpha(1),s(k)_1} \lambda_{\alpha(2),s(k)_2} \cdots \lambda_{\alpha(n),s(k)_n}$. An eigenvector $|s_{\alpha,k}\rangle$ is $\delta$-typical if the distribution defined above $(\lambda_{\alpha(1),s(k)_1} \lambda_{\alpha(2),s(k)_2} \cdots)$ is conditionally typical with respect to the sequence $\alpha$ (see [6]):

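$$-n(S(\rho|\alpha) + \delta) \le \log \lambda_{\alpha,k} \le -n(S(\rho|\alpha) - \delta). \tag{8}$$

Note that all exponentials and logarithms are to base 2 throughout this article. The above definition means that if we define the typical projection as

$$\Pi_\alpha = \sum_{k:\ s_{\alpha,k} \text{ is typical}} |s_{\alpha,k}\rangle\langle s_{\alpha,k}|, \tag{9}$$

then

$$\dim(\Pi_\alpha) \le 2^{n(S(\rho|\alpha)+\delta)}, \tag{10}$$

because

$$1 = \sum_{k=1}^{d^n} \lambda_{\alpha,k} \ge \sum_{k:\ s_{\alpha,k} \text{ is typical}} \lambda_{\alpha,k} \ge \sum_{k:\ s_{\alpha,k} \text{ is typical}} 2^{-n(S(\rho|\alpha)+\delta)}, \tag{11}$$

$$2^{n(S(\rho|\alpha)+\delta)} \ge \sum_{k:\ s_{\alpha,k} \text{ is typical}} 1 = \dim(\Pi_\alpha). \tag{12}$$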
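The counting argument in (10)-(12) can be illustrated by brute force for a short codeword. The sketch below assumes diagonal output states, so that the eigenvalues $\lambda_{\alpha,k}$ are simply products of diagonal entries; the names `typical_indices`, `spectra` and the numerical values are illustrative assumptions.

```python
import itertools
import math

def shannon_entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def typical_indices(spectra, prior, alpha, delta):
    """Enumerate the delta-typical eigenvalue indices of rho_alpha, see (8).

    spectra[a] holds the eigenvalues of rho_a (assumed diagonal here),
    alpha is the codeword, prior the distribution P.
    """
    n = len(alpha)
    d = len(spectra[0])
    cond_entropy = sum(p * shannon_entropy(s) for p, s in zip(prior, spectra))
    typical = []
    for s in itertools.product(range(d), repeat=n):
        lam = math.prod(spectra[a][j] for a, j in zip(alpha, s))
        if lam > 0 and (-n * (cond_entropy + delta)
                        <= math.log2(lam)
                        <= -n * (cond_entropy - delta)):
            typical.append(s)
    return typical, cond_entropy

spectra = [[0.9, 0.1], [0.8, 0.2]]
prior = [0.5, 0.5]
alpha = (0, 1, 0, 1, 0, 1, 0, 1)
delta = 0.2
typical, cond_entropy = typical_indices(spectra, prior, alpha, delta)
dimension = len(typical)
bound = 2 ** (len(alpha) * (cond_entropy + delta))
print(dimension, bound)
```

The dimension of the typical projection never exceeds the bound, since each typical eigenvalue is at least $2^{-n(S(\rho|\alpha)+\delta)}$ and the eigenvalues sum to one.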

The pair $(\alpha, k)$ is $\delta$-typical if

$$-n(H(P) + S(\rho|\alpha) + \delta) \le \log P^n(\alpha)\lambda_{\alpha,k} \le -n(H(P) + S(\rho|\alpha) - \delta). \tag{13}$$

Let the distribution of $(\alpha, s(k))$ be denoted by

$$P_{\alpha,s(k)} = \prod_{i=1}^{n} P(\alpha(i))\, \lambda_{\alpha(i),s(k)_i}; \tag{14}$$

it can be seen that this is the probability of a sequence of independent, identically distributed random variables. The (Shannon) entropy of this distribution per symbol is

$$H(P) + \sum_{i=1}^{l} p_i S(\rho_i). \tag{15}$$
So the law of large numbers also holds for these pairs of random variables, and by summing the probabilities of all typical pairs we again get a number greater than $1 - \varepsilon$ if $n$ is large enough. Suppose that the eigenvalues are indexed so that the typical eigenvalues are the first $d(\alpha)$ indices. Then

$$\sum_{\alpha} P^n(\alpha) \sum_{i=1}^{d(\alpha)} \lambda_{\alpha,i} \ge 1 - 2\varepsilon, \tag{16}$$

because from the sum we only left out the atypical $\alpha$ (whose probability is smaller than $\varepsilon$) and the atypical $(\alpha, k)$ pairs (whose probability is also smaller than $\varepsilon$). From this we know that for $\tilde\rho = \sum_{\alpha} P^n(\alpha) \sum_{i=1}^{d(\alpha)} \lambda_{\alpha,i} |s_{\alpha,i}\rangle\langle s_{\alpha,i}|$ it is true that $\mathrm{Tr}\,\tilde\rho \ge 1 - 2\varepsilon$. Define $\tilde\rho_\alpha$ as

$$\tilde\rho_\alpha = \Pi_\alpha \rho_\alpha \Pi_\alpha. \tag{17}$$



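Remark 1 $\tilde\rho \le \rho^{\otimes n}$ and $\tilde\rho_\alpha \le \rho_\alpha$, and $E(\tilde\rho_\alpha) = \tilde\rho$.

Lemma 1 For every $n \in \mathbb{N}$

$$E[\rho_\alpha] = \rho^{\otimes n}. \tag{18}$$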

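Proof: By induction on $n$. For $n = 1$ the equality is true by the definition of $\rho$. Suppose that the statement is true for $n = k - 1$. For $n = k$, let $\alpha'$ denote an arbitrary sequence of length $k - 1$; then every sequence $\alpha$ of length $k$ can be written as $\alpha = (\alpha', i)$, $1 \le i \le l$. Then

$$E[\rho_\alpha] = \sum_{\alpha} P^k(\alpha)\, \rho_\alpha = \sum_{\alpha'} \sum_{i=1}^{l} P^{k-1}(\alpha')\, p_i\, \rho_{\alpha'} \otimes \rho_i \tag{19}$$

$$= \sum_{\alpha'} P^{k-1}(\alpha')\, \rho_{\alpha'} \otimes \sum_{i=1}^{l} p_i \rho_i = E[\rho_{\alpha'}] \otimes \rho, \tag{20}$$

but we know that for $n = k - 1$ the statement is true, so

$$E[\rho_\alpha] = \rho^{\otimes (k-1)} \otimes \rho = \rho^{\otimes k}. \tag{21}$$

So the statement is true for all $n \in \mathbb{N}$. ■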
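Lemma 1 can be verified numerically for small $n$ by summing over all codewords. The sketch below again assumes diagonal states, for which a tensor product is represented by the product of the diagonals; the helper names and the example values are illustrative.

```python
import itertools
import math

def kron_diag(diagonals):
    """Diagonal of the tensor product of diagonal matrices."""
    result = [1.0]
    for diag in diagonals:
        result = [x * y for x in result for y in diag]
    return result

def expected_codeword_state(prior, spectra, n):
    """Diagonal of E[rho_alpha], summed over all codewords alpha of length n."""
    d = len(spectra[0])
    total = [0.0] * d ** n
    for alpha in itertools.product(range(len(prior)), repeat=n):
        weight = math.prod(prior[a] for a in alpha)
        state = kron_diag([spectra[a] for a in alpha])
        total = [t + weight * s for t, s in zip(total, state)]
    return total

prior = [0.3, 0.7]
spectra = [[0.9, 0.1], [0.2, 0.8]]
n = 3
average = [sum(p * spec[a] for p, spec in zip(prior, spectra))
           for a in range(len(spectra[0]))]
lhs = expected_codeword_state(prior, spectra, n)   # E[rho_alpha]
rhs = kron_diag([average] * n)                     # rho^{tensor n}
max_gap = max(abs(x - y) for x, y in zip(lhs, rhs))
print(max_gap)
```

The two sides agree up to floating-point rounding, as (18) predicts.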

If we have a projection such as $\Pi_\alpha$, then we can define the subspace onto which it projects, $\pi_\alpha = \mathrm{Im}(\Pi_\alpha)$. Conversely, if we define a subspace $\pi_\alpha$, then we can define the orthogonal projection $\Pi_\alpha$ which projects onto this subspace. Throughout the paper the two are denoted by the same letter and indices: the lowercase letter denotes the subspace, the uppercase letter denotes the projection.

Consider the lattice of projections. For two projections $P_1$ and $P_2$, denote by $P_1 \vee P_2$ the result of the $\vee$ operation in the lattice of projections (that is, $P_1 \vee P_2$ is the projection onto the subspace spanned by the ranges of $P_1$ and $P_2$). $P_1 \wedge P_2$ has a similar meaning ($P_1 \wedge P_2$ is the projection onto the intersection of the ranges of $P_1$ and $P_2$).
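Lemma 2 For all density matrices $\rho$, $\sigma$ and for every $\varepsilon > 0$ there exists a projection $\Pi$ with the properties

$$\mathrm{Tr}(\Pi \rho^{\otimes n} \Pi) \ge 1 - \varepsilon, \tag{22}$$

$$\|\Pi \rho^{\otimes n} \Pi\| \le 2^{-n(S(\rho) - \varepsilon)}, \tag{23}$$

$$\mathrm{Tr}(\Pi \sigma^{\otimes n} \Pi) \le 2^{-n(D(\rho\|\sigma) - \varepsilon)}, \tag{24}$$

if $n > N(\rho, \sigma, \varepsilon)$, where $D(\rho\|\sigma)$ denotes the quantum relative entropy of $\rho$ and $\sigma$. For the proof see the appendix.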

Remark 2 Note that $\Pi$ does not depend on $\alpha_i$, the randomly chosen sequence, but $\hat\Pi_{i-}$ does (throughout the article, if a quantity depends on the randomly generated sequence, this is indicated by an $\alpha_i$ in the index or in the argument). This means that $E(\Pi \rho_{\alpha_i} \Pi) = \Pi \rho^{\otimes n} \Pi$ while $E(\rho_{\alpha_i}) = \rho^{\otimes n}$, but $E(\hat\Pi_{i-} \rho_{\alpha_i} \hat\Pi_{i-})$ cannot be simplified in the same way.


3 Reaching the Holevo bound with von Neumann 
measurement 

3.1 Coding/Decoding 

First we generate $M = 2^{nR}$ random codewords with distribution $P$. These codewords are denoted by $\alpha_1, \alpha_2, \ldots, \alpha_M$, and both the sender and the receiver know them. From these we generate quantum codewords.

A quantum codeword is a tensor product of densities defined in the following way: if $\alpha_i(j) = k$, that is, $k$ is the $j$-th symbol of the $i$-th codeword, then the $j$-th density of the $i$-th tensor product is $\omega_k$. The coding is as usual: we choose uniformly from the message set $\mathcal{M}$, whose size is $M$ (suppose the chosen message is $i$), and we send (or generate) the quantum codeword defined above for this message, $\omega_{\alpha_i(1)}, \omega_{\alpha_i(2)}, \ldots, \omega_{\alpha_i(n)}$. The quantum sequence goes through the channel, and the receiver gets the quantum sequence $\rho_{\alpha_i(1)}, \rho_{\alpha_i(2)}, \ldots, \rho_{\alpha_i(n)}$, where $\rho_{\alpha_i(j)} = \mathcal{E}(\omega_{\alpha_i(j)})$.

Now we define a decoding algorithm, which is nothing else than a POVM (in fact a von Neumann measurement). If the typical subspaces of the $\rho_{\alpha_i}$ were orthogonal to each other, there would be no problem: we could build our POVM from the $\Pi_{\alpha_i}$ (where $\Pi_{\alpha_i}$ is defined in (9)). However, this is not the case in general, and we have to orthogonalise them. We do this by a method very similar to Gram-Schmidt orthogonalisation. In the first typical subspace let

$$\hat\pi_1 = \mathrm{span}\{ |\hat s_{\alpha_1,k}\rangle : |\hat s_{\alpha_1,k}\rangle = \Pi |s_{\alpha_1,k}\rangle,\ |s_{\alpha_1,k}\rangle \in \pi_{\alpha_1} \}, \tag{25}$$

where $\mathrm{span}\{\}$ means the subspace spanned by the vectors in the curly brackets.
And for the $i$-th typical projection let

$$\hat\pi_i = \mathrm{span}\Big\{ |\hat s_{\alpha_i,k}\rangle : |\hat s_{\alpha_i,k}\rangle = \Pi |s_{\alpha_i,k}\rangle - \sum_{j=1}^{i-1} \hat\Pi_j \Pi |s_{\alpha_i,k}\rangle,\ |s_{\alpha_i,k}\rangle \in \pi_{\alpha_i} \Big\} \tag{26}$$

(for passing from subspaces to projections and vice versa, see Section 2).

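At the end of the procedure, $\hat\Pi_s$ is orthogonal to $\hat\Pi_t$ for all $s \ne t$. We prove this by induction on $s, t$. Suppose that $s < t$. For $s = 1$, $t = 2$ this is true because

$$\hat\Pi_1 |\hat s_{\alpha_2}\rangle = \hat\Pi_1 \Pi |s_{\alpha_2}\rangle - \hat\Pi_1 \hat\Pi_1 \Pi |s_{\alpha_2}\rangle = \hat\Pi_1 \Pi |s_{\alpha_2}\rangle - \hat\Pi_1 \Pi |s_{\alpha_2}\rangle = 0, \tag{27}$$

so $\hat\pi_2 \subseteq \mathrm{Ker}(\hat\Pi_1)$. Suppose that the statement is true for all pairs $s', t'$ with $s' \le s$, $t' < t$; then

$$\hat\Pi_s |\hat s_{\alpha_t}\rangle = \hat\Pi_s \Pi |s_{\alpha_t}\rangle - \hat\Pi_s \sum_{c=1}^{t-1} \hat\Pi_c \Pi |s_{\alpha_t}\rangle = \hat\Pi_s \Pi |s_{\alpha_t}\rangle - \hat\Pi_s \Pi |s_{\alpha_t}\rangle = 0, \tag{28}$$

because $\hat\Pi_s \hat\Pi_{t'} = 0$ for $t' \ne s$ by the induction assumption. Moreover, the $\hat\Pi_i$ and $\Pi$ are commuting operators (because every vector which spans the subspace that $\hat\Pi_i$ projects onto is a member of $\pi$).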

Our POVM (or our von Neumann measurement) consists of the $\hat\Pi_i$, and if necessary we can make it complete by adding an element (labelled "error") on the remaining orthogonal subspace; call this projection $\hat\Pi_{M+1}$. For index $i$, we define the projection onto the typical subspaces of lesser indices as

$$\hat\Pi_{i-} = \sum_{j=1}^{i-1} \hat\Pi_j.$$
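The sequential orthogonalisation of (25) and (26) can be sketched for real vectors as follows. This is a simplified illustration, not the construction itself: the outer projection $\Pi$ is taken to be the identity, the subspaces are given by spanning vectors, and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subtract_projection(v, basis):
    """Remove from v its components along the orthonormal vectors in basis."""
    w = [float(x) for x in v]
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w

def orthogonalise_subspaces(subspaces, tol=1e-12):
    """Return orthonormal bases of mutually orthogonal subspaces.

    The i-th output spans the residuals of the i-th input vectors after
    removing their components in all earlier output subspaces, as in (26).
    """
    earlier = []   # orthonormal basis of everything built so far
    result = []
    for vectors in subspaces:
        current = []
        for v in vectors:
            w = subtract_projection(v, earlier + current)
            norm = math.sqrt(dot(w, w))
            if norm > tol:          # drop vectors already covered
                current.append([x / norm for x in w])
        result.append(current)
        earlier.extend(current)
    return result

subspaces = [
    [[1, 0, 0], [0, 1, 0]],   # first typical subspace
    [[1, 1, 1]],              # overlaps with the first one
    [[0, 1, 1]],              # lies entirely in the span of the earlier ones
]
bases = orthogonalise_subspaces(subspaces)
print(bases)
```

A subspace that is already contained in the span of the earlier ones is reduced to the zero subspace, which corresponds to a message that this measurement can never identify.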

3.2 The error probability of Decoding 

Now we show that the error probability goes to 0 as the block length $n$ goes to infinity.

Theorem 1 With this scheme, if $R < \chi(\rho)$, then for any $\gamma > 0$ the average error probability is smaller than $\gamma$, provided that the block length $n$ of the quantum codewords is greater than $n_0(R, \gamma)$.

To simplify the proof we use the following lemma. 

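Lemma 3 The length of the projection of $|s_{\alpha_i,j}\rangle$ onto $\hat\Pi_i$ (an element of our measuring POVM) can be bounded from below by two components, where one is the projection onto $\Pi$ and the other is the projection onto the typical subspaces of the lesser indices ($\hat\Pi_{i-}$), as follows:

$$\mathrm{Tr}(\hat\Pi_i \tilde\rho_{\alpha_i} \hat\Pi_i) \ge \big(\mathrm{Tr}(\Pi \tilde\rho_{\alpha_i} \Pi) - \mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)^2. \tag{29}$$

(For the definition of $\tilde\rho_{\alpha_i}$ see (17).)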
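Proof:

$$\mathrm{Tr}(\hat\Pi_i \tilde\rho_{\alpha_i} \hat\Pi_i) = \mathrm{Tr}(\hat\Pi_i \tilde\rho_{\alpha_i}) = \mathrm{Tr}\Big(\hat\Pi_i \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\Big) = \tag{30}$$

Now we decompose the projection $\hat\Pi_i$, depending on $j$, as follows:

$$\hat\Pi_i = a_j^2 |\hat s_{\alpha_i,j}\rangle\langle \hat s_{\alpha_i,j}| + \hat\Pi_i', \tag{31}$$

where $a_j \ge 1$ is the reciprocal of the length of $|\hat s_{\alpha_i,j}\rangle = \Pi |s_{\alpha_i,j}\rangle - \sum_{c=1}^{i-1} \hat\Pi_c \Pi |s_{\alpha_i,j}\rangle$ (the length of $|\hat s_{\alpha_i,j}\rangle$ is at most 1 because it is an orthogonal component of the unit vector $|s_{\alpha_i,j}\rangle$), and $\hat\Pi_i'$ is the projection onto the remaining part of $\hat\pi_i$. So we continue the first row by

$$= \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}\big(a_j^2 |\hat s_{\alpha_i,j}\rangle\langle \hat s_{\alpha_i,j}|s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}| + \hat\Pi_i' |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\big) = \tag{32}$$

$$= \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j}\, a_j^2\, \mathrm{Tr}\big(|\hat s_{\alpha_i,j}\rangle\langle \hat s_{\alpha_i,j}|s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\big) + \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}\big(\hat\Pi_i' |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\big) \ge \tag{33}$$

$$\ge \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}\big(|\hat s_{\alpha_i,j}\rangle\langle \hat s_{\alpha_i,j}|s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\big) = \tag{34}$$

$$= \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}\big((\Pi |s_{\alpha_i,j}\rangle - \hat\Pi_{i-} \Pi |s_{\alpha_i,j}\rangle)(\langle s_{\alpha_i,j}| \Pi - \langle s_{\alpha_i,j}| \Pi \hat\Pi_{i-}) |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|\big) = \tag{35}$$

$$= \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \big(\langle s_{\alpha_i,j}| (\Pi |s_{\alpha_i,j}\rangle - \hat\Pi_{i-} \Pi |s_{\alpha_i,j}\rangle)\big)^2, \tag{36}$$

where (34) comes from the fact that $a_j \ge 1$, so $a_j^2 \ge 1$, and that the second sum in (33) is non-negative. From Jensen's inequality,

$$\ge \Big(\sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \langle s_{\alpha_i,j}| (\Pi |s_{\alpha_i,j}\rangle - \hat\Pi_{i-} \Pi |s_{\alpha_i,j}\rangle)\Big)^2 = \tag{37}$$

$$= \Big(\sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \langle s_{\alpha_i,j}| \Pi |s_{\alpha_i,j}\rangle - \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \langle s_{\alpha_i,j}| \hat\Pi_{i-} \Pi |s_{\alpha_i,j}\rangle\Big)^2 = \tag{38}$$

$$= \Big(\sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}(\Pi |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|) - \sum_{j=1}^{d(\alpha_i)} \lambda_{\alpha_i,j} \mathrm{Tr}(\hat\Pi_{i-} \Pi |s_{\alpha_i,j}\rangle\langle s_{\alpha_i,j}|)\Big)^2 = \tag{39}$$

$$= \big(\mathrm{Tr}(\Pi \tilde\rho_{\alpha_i}) - \mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i})\big)^2 \ge \big(\mathrm{Tr}(\Pi \tilde\rho_{\alpha_i} \Pi) - \mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)^2. \tag{40}$$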
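Now we turn to the proof of the theorem.

Proof of Theorem 1: Let $\varepsilon$ be so small that $R + \varepsilon < \chi(\rho)$ and $8\varepsilon < \gamma$, and let $n$ be so large that all the lemmas and definitions of Section 2 hold with $\varepsilon$.

$$P_e = E\Big[1 - \frac{1}{M} \sum_{i=1}^{M} \mathrm{Tr}(\hat\Pi_i \rho_{\alpha_i} \hat\Pi_i)\Big] \le \tag{41}$$

$$\le \frac{1}{M} \sum_{i=1}^{M} E\big[1 - \mathrm{Tr}(\hat\Pi_i \rho_{\alpha_i} \hat\Pi_i)\big] \le \tag{42}$$

$$\le \frac{1}{M} \sum_{i=1}^{M} E\big[1 - \mathrm{Tr}(\hat\Pi_i \tilde\rho_{\alpha_i} \hat\Pi_i)\big]. \tag{43}$$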
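We use Lemma 3:

$$P_e \le \frac{1}{M} \sum_{i=1}^{M} 1 - E\Big[\big(\mathrm{Tr}(\Pi \tilde\rho_{\alpha_i} \Pi) - \mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)^2\Big]. \tag{44}$$

We use Jensen's inequality:

$$\le \frac{1}{M} \sum_{i=1}^{M} 1 - \Big[E\big(\mathrm{Tr}(\Pi \tilde\rho_{\alpha_i} \Pi) - \mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)\Big]^2 = \tag{45}$$

$$= \frac{1}{M} \sum_{i=1}^{M} 1 - \Big[\mathrm{Tr}(\Pi E(\tilde\rho_{\alpha_i}) \Pi) - E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)\Big]^2 = \tag{46}$$

$$= \frac{1}{M} \sum_{i=1}^{M} 1 - \Big[\mathrm{Tr}(\Pi \tilde\rho \Pi) - E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)\Big]^2. \tag{47}$$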

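From (16) and Remark 1 we know that $\mathrm{Tr}(\tilde\rho) \ge 1 - 2\varepsilon$ and $\rho^{\otimes n} \ge \tilde\rho$, so for $\Delta = \rho^{\otimes n} - \tilde\rho$ we have $\mathrm{Tr}(\Delta) \le 2\varepsilon$. With this,

$$\mathrm{Tr}(\Pi \tilde\rho \Pi) = \mathrm{Tr}(\Pi \rho^{\otimes n} \Pi) - \mathrm{Tr}(\Pi (\rho^{\otimes n} - \tilde\rho) \Pi) \ge 1 - \varepsilon - \mathrm{Tr}(\Delta). \tag{48}$$

So the error probability satisfies

$$P_e \le \frac{1}{M} \sum_{i=1}^{M} 1 - \Big[1 - 3\varepsilon - E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)\Big]^2 \le \tag{49}$$

$$\le \frac{1}{M} \sum_{i=1}^{M} 1 - \Big[1 - 3\varepsilon - E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big)\Big]^2. \tag{50}$$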

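Now we analyse the last term. Let $|b_j\rangle$, $1 \le j \le \dim(\hat\Pi_{i-})$, be an orthonormal basis of the range of $\hat\Pi_{i-}$. Then

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big) = E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi)\big) = E\Big(\mathrm{Tr}\Big(\sum_{j=1}^{\dim(\hat\Pi_{i-})} |b_j\rangle\langle b_j| \Pi \tilde\rho_{\alpha_i} \Pi\Big)\Big) = \tag{51}$$

$$= E\Big(\sum_{j=1}^{\dim(\hat\Pi_{i-})} \langle b_j| \Pi \tilde\rho_{\alpha_i} \Pi |b_j\rangle\Big). \tag{52}$$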

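Now we have to evaluate $\dim(\hat\Pi_{i-})$. Because of (10),

$$\dim(\hat\Pi_{i-}) \le \sum_{j=1}^{i-1} \dim(\Pi_{\alpha_j}) \le M\, 2^{nS(\rho|\alpha)} = 2^{n(R + S(\rho|\alpha))}, \tag{53}$$

so

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big) \le 2^{n(R + S(\rho|\alpha))} \max_j E\big(\langle b_j| \Pi \tilde\rho_{\alpha_i} \Pi |b_j\rangle\big) = \tag{54}$$

$$= 2^{n(R + S(\rho|\alpha))} \max_j \langle b_j| \Pi \tilde\rho \Pi |b_j\rangle \le 2^{n(R + S(\rho|\alpha))} \max_j \langle b_j| \Pi \rho^{\otimes n} \Pi |b_j\rangle. \tag{55}$$

Because the $b_j$ are unit vectors, from (23) we know that $\max_j \langle b_j| \Pi \rho^{\otimes n} \Pi |b_j\rangle \le 2^{-n(S(\rho) - \varepsilon)}$, so

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big) \le 2^{n(R + S(\rho|\alpha))}\, 2^{-n(S(\rho) - \varepsilon)} \le 2^{-n(S(\rho) - S(\rho|\alpha) - R - \varepsilon)}. \tag{56}$$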

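Now we can see that $S(\rho) - S(\rho|\alpha) = \chi(\rho)$, and we assumed that $R + \varepsilon < \chi(\rho)$, so the exponent is negative. If $n$ is large enough, then the whole expression is less than $\varepsilon$:

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi \tilde\rho_{\alpha_i} \Pi \hat\Pi_{i-})\big) \le \varepsilon. \tag{57}$$

So the error probability is bounded by

$$P_e \le 1 - [1 - 4\varepsilon]^2 = 8\varepsilon - 16\varepsilon^2 \le 8\varepsilon. \tag{58}$$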
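To get a feeling for how large $n$ has to be for (57), one can solve $2^{-n(\chi - R - \varepsilon)} \le \varepsilon$ for $n$. The sketch below only evaluates this single condition; it ignores the other requirements on $n$ coming from Section 2, and the function names are illustrative.

```python
import math

def min_blocklength(chi, rate, eps):
    """Smallest n with 2^{-n(chi - rate - eps)} <= eps, cf. (56)-(57)."""
    gap = chi - rate - eps
    if gap <= 0:
        raise ValueError("the scheme needs rate + eps < chi")
    return math.ceil(math.log2(1.0 / eps) / gap)

def error_bound(eps):
    """Right-hand side of (58): 1 - (1 - 4 eps)^2 = 8 eps - 16 eps^2."""
    return 1.0 - (1.0 - 4.0 * eps) ** 2

n_min = min_blocklength(chi=0.5, rate=0.3, eps=0.1)
print(n_min, error_bound(0.01))
```

The required block length grows like $\log(1/\varepsilon)$ divided by the gap between the rate and the Holevo quantity, so rates close to $\chi$ need long codewords.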



4 I. Use: Coding for finite compound channel

First we give a definition of the compound channel. Suppose there is a given set of channels $\mathcal{S}$. We want a predefined coding scheme to encode our message under the following disturbance: suppose there is an enemy who chooses one channel from the set after we have generated our quantum codeword. Our quantum codeword is then sent through the chosen channel (so the same channel acts on all the quantum systems, but we cannot say which one).

This is a more realistic model than a simple quantum channel: we know which effects can destroy our quantum codewords, but we cannot know which effect is active at the present moment. The compound channel is a finite compound channel if the set $\mathcal{S}$ is finite. Define:

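$$\chi(\mathcal{S}, P, \omega_1^l) = \min_{\mathcal{E} \in \mathcal{S}} \chi(\mathcal{E}, P, \omega_1^l), \tag{59}$$

$$\chi(\mathcal{S}) = \max_{P, \omega_1^l} \chi(\mathcal{S}, P, \omega_1^l). \tag{60}$$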



Theorem 2 The classical capacity $C$ of the finite compound channel is $\chi(\mathcal{S})$. This means that if $R < \chi(\mathcal{S})$, then for any $\gamma$ there exists a number $N(\gamma, R, |\mathcal{S}|)$ such that if the length $n$ of the quantum codewords is larger than this number, then the error probability is smaller than $\gamma$.

Proof: First, we can assume that $|\mathcal{S}| \ge 2$, because for $|\mathcal{S}| = 1$ the proof is the same as that of Theorem 1. The inequality $C \le \chi(\mathcal{S})$ is simple. In [4] it was shown that a quantum system can carry at most $\chi(\mathcal{E}, P, \omega_1^l)$ bits of information if the sender encodes his message into quantum states $\mathcal{E}(\omega_i)$ with a priori probabilities $P$. If the sender has no knowledge of which channel is being used, then he can only encode his message into states $\omega_i$ with a priori distribution $P$. Then, if his enemy chooses the worst channel for this scheme, the carried information cannot be larger than

$$\max_{P, \omega_1^l} \min_{\mathcal{E} \in \mathcal{S}} \chi(\mathcal{E}, P, \omega_1^l) = \chi(\mathcal{S}).$$

To prove that $C \ge \chi(\mathcal{S})$, we show a coding scheme whose rate can reach the capacity.

4.1 Coding for finite compound channel 

Let $|\mathcal{S}| = a$ and $\mathcal{S} = \{\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_a\}$ be fixed. Let $P$ and $\omega_1, \omega_2, \ldots, \omega_l$ be the probability distribution and the quantum states that reach the maximum in (60). The sender encodes his message into a randomly generated codeword as in Section 3. The codeword goes into the quantum channel $\mathcal{E}_o$, $o \in \{1, 2, \ldots, a\}$, which was chosen by our enemy. Denote by $\omega = \sum_{i=1}^{l} p_i \omega_i$ the input mixed state, and denote by $\rho^k = \mathcal{E}_k^{\otimes n}(\omega^{\otimes n})$, $1 \le k \le a$, the possible mixed outputs of the channel. Similarly, denote by $\rho^k_{\alpha_i} = \mathcal{E}_k(\omega_{\alpha_i})$ the possible outputs of the $i$-th quantum codeword. To simplify our proof we can assume that the set of quantum channels is ordered so that the first $\bar a$ outputs $\rho^k$, $k \in \{1, 2, \ldots, \bar a\}$, are different.

4.2 Decoding for finite compound channel 

Decoding is done in two steps. In the first step we detect which mixed state we have. In the second step we detect the message.

Consider the first step. Let $\bar a$ denote the number of different output mixed states. To distinguish the output mixed states we use Lemma 2. Let $\varepsilon$ be so small that $\varepsilon < \min_{i \ne j,\ 1 \le i,j \le \bar a} D(\rho^i\|\rho^j)$, $\varepsilon < \frac{\gamma}{8a}$ and $R + \varepsilon < \chi(\mathcal{S})$. Then, for $n$ large enough, Lemma 2 holds for every $i \ne j$, $1 \le i, j \le \bar a$, with $\rho = \rho^i$, $\sigma = \rho^j$, and gives a projection $\Pi^{i,j}$ (if $n$ is greater than $\max_{i,j} N(\rho^i, \rho^j, \varepsilon)$). Now we build a typical projection for every $i$ as follows:

$$\Pi^i = \bigwedge_{1 \le j \le \bar a,\ j \ne i} \Pi^{i,j}. \tag{61}$$

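Observe that

$$\mathrm{Tr}(\Pi^i (\rho^i)^{\otimes n} \Pi^i) = 1 - \mathrm{Tr}\big((\Pi^i)^c (\rho^i)^{\otimes n} (\Pi^i)^c\big) \ge \tag{62}$$

$$\ge 1 - \sum_{j \ne i} \mathrm{Tr}\big((\Pi^{i,j})^c (\rho^i)^{\otimes n} (\Pi^{i,j})^c\big) \ge 1 - \bar a \varepsilon, \tag{63}$$

$$\|\Pi^i (\rho^i)^{\otimes n} \Pi^i\| \le 2^{-n(S(\rho^i) - \varepsilon)}, \tag{64}$$

$$\mathrm{Tr}(\Pi^i (\rho^j)^{\otimes n} \Pi^i) \le 2^{-n(D(\rho^i\|\rho^j) - \varepsilon)} \tag{65}$$

for all $1 \le i, j \le \bar a$, $i \ne j$.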
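The bound (65) shows that the probability of confusing two output states decays exponentially at a rate given by the relative entropy. For diagonal (commuting) states the quantum relative entropy reduces to the classical Kullback-Leibler divergence of the diagonals, which the sketch below evaluates; the function names and numbers are illustrative assumptions.

```python
import math

def relative_entropy_diagonal(p, q):
    """D(p||q) in bits for diagonal states with diagonals p and q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # 0 log 0 = 0 by convention
        if qi == 0:
            return math.inf     # support of p not contained in that of q
        total += pi * math.log2(pi / qi)
    return total

def confusion_bound(p, q, eps, n):
    """Right-hand side of (65): 2^{-n(D(p||q) - eps)}."""
    return 2.0 ** (-n * (relative_entropy_diagonal(p, q) - eps))

rho_1 = [0.5, 0.5]
rho_2 = [0.9, 0.1]
divergence = relative_entropy_diagonal(rho_1, rho_2)
print(divergence, confusion_bound(rho_1, rho_2, 0.1, 50))
```

The requirement that $\varepsilon$ be smaller than every pairwise relative entropy is exactly what keeps the exponent in (65) negative.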

We detect the mixed state as follows. First we give a sequence of measurements. Our first POVM consists of $\Pi^1, (\Pi^1)^c$: if we measure $\Pi^1$, we know that our mixed state was $\rho^1$, so we stop; if we measure $(\Pi^1)^c$, then we measure again. The second POVM consists of $\Pi^2, (\Pi^2)^c$, etc. With this we can differentiate the $\bar a$ possible types of our mixed state.

Suppose that our enemy chose the channel $\mathcal{E}_o$, which generates the $k$-th mixed state (this means $\mathcal{E}_o(\omega) = \mathcal{E}_k(\omega)$). If $k \ne 1$, then the first measurement is good if we measure the second projection. This probability is

$$\mathrm{Tr}\big((\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c\big), \tag{66}$$

and our quantum state will be

$$\frac{(\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c}{\mathrm{Tr}\big((\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c\big)}. \tag{67}$$

Next, if $k \ne 2$, then the next measurement is good if we measure the second projection; this probability is

$$\mathrm{Tr}\Big((\Pi^2)^c \frac{(\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c}{\mathrm{Tr}\big((\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c\big)} (\Pi^2)^c\Big), \tag{68}$$

and our state becomes

$$\frac{(\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c}{\mathrm{Tr}\big((\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c\big)}. \tag{69}$$
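And the probability that we make no error in the first and the second step is

$$\mathrm{Tr}\big((\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c\big) \cdot \frac{\mathrm{Tr}\big((\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c\big)}{\mathrm{Tr}\big((\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c\big)} = \tag{70}$$

$$= \mathrm{Tr}\big((\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c\big). \tag{71}$$

From this we can see that, if $\mathcal{E}_o(\omega) = \mathcal{E}_k(\omega)$, then the probability that we detect our mixed state correctly is

$$E\big[\mathrm{Tr}\big(\Pi^k (\Pi^{k-1})^c \cdots (\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c \cdots (\Pi^{k-1})^c \Pi^k\big)\big]. \tag{72}$$

Let $\delta = \bar a \varepsilon$. From Lemma 1 and from the definition of $\Pi^k$ we know that

$$E\big[\mathrm{Tr}(\Pi^k \rho^o_{\alpha_i} \Pi^k)\big] = \mathrm{Tr}(\Pi^k \rho^k \Pi^k) \ge 1 - \delta. \tag{73}$$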

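Moreover,

$$1 - \delta \le \mathrm{Tr}(\Pi^k \rho^k \Pi^k) \le \mathrm{Tr}\big(\Pi^k (\Pi^1)^c \rho^k (\Pi^1)^c \Pi^k\big) + \mathrm{Tr}(\Pi^k \Pi^1 \rho^k \Pi^1 \Pi^k) \le \tag{74}$$

$$\le \mathrm{Tr}\big(\Pi^k (\Pi^1)^c \rho^k (\Pi^1)^c \Pi^k\big) + \mathrm{Tr}(\Pi^1 \rho^k \Pi^1) \le \tag{75}$$

$$\le \mathrm{Tr}\big(\Pi^k (\Pi^1)^c \rho^k (\Pi^1)^c \Pi^k\big) + 2^{-n(D(\rho^1\|\rho^k) - \varepsilon)} \le \tag{76}$$

$$\le \mathrm{Tr}\big(\Pi^k (\Pi^2)^c (\Pi^1)^c \rho^k (\Pi^1)^c (\Pi^2)^c \Pi^k\big) + \mathrm{Tr}\big(\Pi^k \Pi^2 (\Pi^1)^c \rho^k (\Pi^1)^c \Pi^2 \Pi^k\big) + 2^{-n(D(\rho^1\|\rho^k) - \varepsilon)} \le \tag{77}$$

$$\le \mathrm{Tr}\big(\Pi^k (\Pi^2)^c (\Pi^1)^c \rho^k (\Pi^1)^c (\Pi^2)^c \Pi^k\big) + \mathrm{Tr}(\Pi^2 \rho^k \Pi^2) + 2^{-n(D(\rho^1\|\rho^k) - \varepsilon)} \le \tag{78}$$

$$\le \mathrm{Tr}\big(\Pi^k (\Pi^2)^c (\Pi^1)^c \rho^k (\Pi^1)^c (\Pi^2)^c \Pi^k\big) + 2^{-n(D(\rho^2\|\rho^k) - \varepsilon)} + 2^{-n(D(\rho^1\|\rho^k) - \varepsilon)} \le \tag{79}$$

$$\le \cdots \le \mathrm{Tr}\big(\Pi^k (\Pi^{k-1})^c \cdots (\Pi^2)^c (\Pi^1)^c \rho^k (\Pi^1)^c (\Pi^2)^c \cdots (\Pi^{k-1})^c \Pi^k\big) + \sum_{i=1}^{k-1} 2^{-n(D(\rho^i\|\rho^k) - \varepsilon)}. \tag{80}$$

This means that

$$1 - \delta - \sum_{i=1}^{k-1} 2^{-n(D(\rho^i\|\rho^k) - \varepsilon)} \le \tag{81}$$

$$\le E\big[\mathrm{Tr}\big(\Pi^k (\Pi^{k-1})^c \cdots (\Pi^2)^c (\Pi^1)^c \rho^o_{\alpha_i} (\Pi^1)^c (\Pi^2)^c \cdots (\Pi^{k-1})^c \Pi^k\big)\big]. \tag{82}$$

Observe that

$$1 - \delta - a\, 2^{-n(\min_{i} D(\rho^i\|\rho^k) - \varepsilon)} \le 1 - \delta - \sum_{i=1}^{k-1} 2^{-n(D(\rho^i\|\rho^k) - \varepsilon)}. \tag{83}$$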
Because $\varepsilon < D(\rho^i\|\rho^k)$ for every $i \ne k$, if $n$ is large enough then

$$a\, 2^{-n(\min_{i \in \{1,2,\ldots,k-1\}} D(\rho^i\|\rho^k) - \varepsilon)} \le \delta, \tag{84}$$

which means that the expectation value of the probability of correct detection is greater than $1 - 2\delta$. This means that the error (that we detect a wrong mixed state, or that none of the measurements ever decides for the first projection) is smaller than $2\delta$. Note that this bound is valid for all possible $1 \le k \le \bar a$. Denote by $P^k$ the operator $\Pi^k (\Pi^{k-1})^c \cdots (\Pi^2)^c (\Pi^1)^c$; with this notation, at the end of the procedure our quantum codeword $\rho^o_{\alpha_i}$ will be in the form

$$\frac{P^k \rho^o_{\alpha_i} P^{k*}}{\mathrm{Tr}(P^k \rho^o_{\alpha_i} P^{k*})}.$$

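Now consider the second step, in which we detect the message. Suppose that we detected that our mixed state is $\rho^k$, which can be generated by $\mathcal{E}_{k_1}, \mathcal{E}_{k_2}, \ldots, \mathcal{E}_{k_l}$, $1 \le k_j \le a$, $l \le a$, and we know that our enemy chose $\mathcal{E}_o$, so some $k_j = o$. Prepare everything as in Section 3. Now define

$$\pi_i = \mathrm{span}\Big\{ \bigcup_{j \in \{k_1, \ldots, k_l\}} \pi_i^j \Big\}, \tag{85}$$

where $\pi_i^j$ is the typical subspace (9) of $\rho^j_{\alpha_i} = \mathcal{E}_j(\omega_{\alpha_i})$. And define the typical projections for the message as in (25), (26):

$$\hat\pi_i = \mathrm{span}\Big\{ P^k |s\rangle - \sum_{j=1}^{i-1} \hat\Pi_j P^k |s\rangle,\ |s\rangle \in \pi_i \Big\}. \tag{86}$$

We build our POVM as in Section 3 from these orthogonal projections, completed if necessary by a subspace labelled as error. Similarly to Section 3, we define $\hat\Pi_{i-} = \sum_{j=1}^{i-1} \hat\Pi_j$.

For this measurement a statement similar to Lemma 3 is true:

$$\mathrm{Tr}(\hat\Pi_i P^k \tilde\rho^o_{\alpha_i} P^{k*} \hat\Pi_i) \ge \big(\mathrm{Tr}(P^k \tilde\rho^o_{\alpha_i} P^{k*}) - \mathrm{Tr}(\hat\Pi_{i-} P^k \tilde\rho^o_{\alpha_i} P^{k*} \hat\Pi_{i-})\big)^2. \tag{87}$$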

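The proof is exactly the same as that of Lemma 3. So we can calculate the error probability of the message detection (given the correct detection of the mixed state):

$$P_e = E\Big[\frac{1}{M} \sum_{i=1}^{M} \mathrm{Tr}(P^k \rho^o_{\alpha_i} P^{k*}) \Big(1 - \frac{\mathrm{Tr}(\hat\Pi_i P^k \rho^o_{\alpha_i} P^{k*} \hat\Pi_i)}{\mathrm{Tr}(P^k \rho^o_{\alpha_i} P^{k*})}\Big)\Big] \le \frac{1}{M} \sum_{i=1}^{M} E\big[1 - \mathrm{Tr}(\hat\Pi_i P^k \rho^o_{\alpha_i} P^{k*} \hat\Pi_i)\big] \le \tag{88}$$

with the previous statement, and with Jensen's inequality,

$$\le \frac{1}{M} \sum_{i=1}^{M} E\Big[1 - \big(\mathrm{Tr}(P^k \tilde\rho^o_{\alpha_i} P^{k*}) - \mathrm{Tr}(\hat\Pi_{i-} P^k \tilde\rho^o_{\alpha_i} P^{k*} \hat\Pi_{i-})\big)^2\Big] \le \tag{89}$$

$$\le \frac{1}{M} \sum_{i=1}^{M} 1 - \Big(\mathrm{Tr}(P^k \tilde\rho^o P^{k*}) - E\big[\mathrm{Tr}(\hat\Pi_{i-} P^k \tilde\rho^o_{\alpha_i} P^{k*} \hat\Pi_{i-})\big]\Big)^2. \tag{90}$$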
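Because $\mathrm{Tr}(P^k \rho^o P^{k*}) \ge 1 - 2\delta$ and $\mathrm{Tr}(\tilde\rho^o) \ge 1 - \delta$, the first term is greater than $1 - 3\delta$, as in the proof of Theorem 1. The second term is

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} P^k \tilde\rho^o_{\alpha_i} P^{k*} \hat\Pi_{i-})\big) = E\big(\mathrm{Tr}(\hat\Pi_{i-} P^k \tilde\rho^o_{\alpha_i} P^{k*})\big) = \tag{91}$$

$$= E\Big(\mathrm{Tr}\Big(\sum_{j=1}^{\dim(\hat\Pi_{i-})} |b_j\rangle\langle b_j| P^k \tilde\rho^o_{\alpha_i} P^{k*}\Big)\Big) \le E\Big(\sum_{j=1}^{\dim(\hat\Pi_{i-})} \langle b_j| \Pi^k \tilde\rho^o_{\alpha_i} \Pi^k |b_j\rangle\Big) \le \dim(\hat\Pi_{i-})\, \|\Pi^k \rho^k \Pi^k\|, \tag{92}$$

because $P^k \tilde\rho^o_{\alpha_i} P^{k*} \le \Pi^k \tilde\rho^o_{\alpha_i} \Pi^k$, and we assumed that $\mathcal{E}_o(\omega) = \mathcal{E}_k(\omega)$. Now we have to evaluate $\dim(\hat\Pi_{i-})$:

$$\dim(\hat\Pi_{i-}) \le \sum_{j=1}^{l} \sum_{s=1}^{i-1} \dim(\Pi^{k_j}_{\alpha_s}) \le a M\, 2^{nS(\rho^o|\alpha)} = a\, 2^{n(R + S(\rho^o|\alpha))}. \tag{93}$$

We use (23) and get

$$E\big(\mathrm{Tr}(\hat\Pi_{i-} \Pi^k \tilde\rho^o_{\alpha_i} \Pi^k \hat\Pi_{i-})\big) \le a\, 2^{-n(S(\rho^o) - S(\rho^o|\alpha) - R - \varepsilon)}, \tag{94}$$

which is smaller than $\delta$ if $n$ is large enough, so

$$P_e \le \frac{1}{M} \sum_{i=1}^{M} 1 - (1 - 3\delta - \delta)^2 \le 8\delta \le 8a\varepsilon. \tag{95}$$

Because $\varepsilon < \frac{\gamma}{8a}$, with this the theorem is proved. ■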



5 II. Use: Practical considerations 

One could think, after the articles of Schumacher or Holevo [3], [4], that we can communicate classical data through a quantum channel optimally. However, this is true only in theory, because measuring a POVM with many outcomes (the required number of outcomes of the POVM grows exponentially in $n$, the length of the codeword) is very difficult in practice. But as we will see, this is not the case for the von Neumann measurement: we will give a detection algorithm, a sequence of measurements, in which the number of outcomes of each measurement is always 2.

We introduce the following notation for $1 \le i \le j \le M + 1$:

$$D_{\{i,j\}} = \hat\Pi_i + \hat\Pi_{i+1} + \cdots + \hat\Pi_j. \tag{96}$$

For simplicity, suppose that $M + 1 = 2^k$. Then the detection algorithm can be the following. First we perform a von Neumann measurement consisting of $D_{\{1,(M+1)/2\}}$, $D_{\{(M+1)/2+1,M+1\}}$. In every measurement, if the result is the first operator we output 0, and if it is the second we output 1.

Now we measure again, of course on a quantum state that has been modified by the previous measurement. In each following step we halve the interval of the previous measurement. If our measurement gave $\{i,j\}$, our next measurement will consist of $D_{\{i,i+(j-i)/2\}}$, $D_{\{i+(j-i)/2+1,j\}}$ (with $(j-i)/2$ rounded down). For example, the second step looks as follows: if we measured 0, then our measurement will consist of $D_{\{1,(M+1)/4\}}$, $D_{\{(M+1)/4+1,(M+1)/2\}}$; if the previous measurement gave the result 1, then our measurement will consist of $D_{\{(M+1)/2+1,3(M+1)/4\}}$, $D_{\{3(M+1)/4+1,M+1\}}$. At the end the 0s and 1s give the number of the message in binary form. If we get only 1s, then we declare an error.
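The halving procedure above is a binary search over the $M + 1$ projections. The following sketch models only its classical skeleton: the comparison `outcome <= mid` stands in for a two-outcome measurement $\{D_{\{i,mid\}}, D_{\{mid+1,j\}}\}$ on a state that already lies in the range of one $\hat\Pi_t$. The function name and this simplification are assumptions made for illustration.

```python
def binary_search_decode(outcome, num_outcomes):
    """Locate `outcome` in 1..num_outcomes using two-outcome steps.

    Each loop iteration models one measurement; bit 0 means the first
    operator (lower half) was observed, bit 1 the second (upper half).
    The last index, num_outcomes, plays the role of the error outcome.
    """
    lo, hi = 1, num_outcomes
    bits = []
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if outcome <= mid:
            bits.append(0)
            hi = mid
        else:
            bits.append(1)
            lo = mid + 1
    decoded = None if lo == num_outcomes else lo
    return decoded, bits

# M = 7 messages plus one error outcome, so M + 1 = 2^3.
for message in range(1, 9):
    print(message, binary_search_decode(message, 8))
```

With $M + 1 = 2^k$ every message is found after exactly $k = \log_2(M+1)$ two-outcome steps, which is approximately $nR$ for $M = 2^{nR}$ and hence linear in the block length $n$.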

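Observe that the probability of correct detection does not change. Suppose that we send the first message; then the first measurement will give the correct result with probability

$$\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big), \tag{97}$$

and the state will change to

$$\frac{D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}}{\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big)}. \tag{98}$$

The second measurement will be correct with probability

$$\mathrm{Tr}\Big(D_{\{1,(M+1)/4\}} \frac{D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}}{\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big)} D_{\{1,(M+1)/4\}}\Big). \tag{99}$$

But $D_{\{1,(M+1)/4\}} \le D_{\{1,(M+1)/2\}}$, so this simplifies to

$$\frac{\mathrm{Tr}\big(D_{\{1,(M+1)/4\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/4\}}\big)}{\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big)}, \tag{100}$$

and the state changes to

$$\frac{D_{\{1,(M+1)/4\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/4\}}}{\mathrm{Tr}\big(D_{\{1,(M+1)/4\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/4\}}\big)}. \tag{101}$$

So the probability that the first two measurements were correct is

$$\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big) \cdot \frac{\mathrm{Tr}\big(D_{\{1,(M+1)/4\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/4\}}\big)}{\mathrm{Tr}\big(D_{\{1,(M+1)/2\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/2\}}\big)} = \tag{102}$$

$$= \mathrm{Tr}\big(D_{\{1,(M+1)/4\}}\, \rho_{\alpha_1}\, D_{\{1,(M+1)/4\}}\big). \tag{103}$$

Continuing this train of thought, we can see that in the end the probability that all of the measurements were correct is nothing else than

$$\mathrm{Tr}\big(D_{\{1,1\}}\, \rho_{\alpha_1}\, D_{\{1,1\}}\big) = \mathrm{Tr}(\hat\Pi_1 \rho_{\alpha_1} \hat\Pi_1), \tag{104}$$

which is the same as the probability of correct decoding in Section 3, so the error probability is unchanged. Of course this procedure can be generalised to the case when each measurement has finitely many possible outcomes.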
This means that it is possible to encode and decode classical information with a quantum apparatus in an optimal way in linear time. This is a quite surprising result, because in classical information theory reaching the Shannon limit in polynomial time is an unresolved problem (the best result needs $n \log(n)$ time). Usually classical information theory is considered a part of quantum information theory, which would mean that optimal decoding of a classical channel in linear time is possible. This means that if a quantum machine can perform arbitrary von Neumann measurements with only two possible outcomes, then this machine can solve non-polynomial classical problems in linear time.

Now we will show how a classical message can be decoded by a quantum apparatus. Suppose that there is a classical setup with a discrete memoryless channel. There is a given state transition matrix $W(y|x)$ (with input and output alphabets $\{1, 2, \ldots, l\}$ and $\{1, 2, \ldots, d\}$) and a given optimal input distribution $P$. Now model the classical system with a quantum one. For each $x$ define $\rho_x = \mathrm{diag}(W(\cdot|x))$ (where $\mathrm{diag}(W(\cdot|x))$ denotes the diagonal matrix we get from the output distribution belonging to $x$; in another form, $\sum_{a \in \{1,\ldots,d\}} W(a|x) E_{a,a}$). In this case all the classical and all the quantum information quantities coincide ($\chi(\mathcal{E}) = C(W)$, $H(\cdot) = S(\cdot)$). Then we know from Section 3 that there exist $2^{Rn}$ sequences such that with the quantum codewords $\rho_{\alpha_i}$ we can communicate optimally. Compute the optimal von Neumann measurement as in Section 3. Now we use the sequences $\alpha_i$ as input codewords for our classical channel, and decode the classical channel as follows: we receive the classical signal, we encode it into quantum sequences, we perform the measurement, and after that we obtain the number of the message that was sent, so we have decoded the message (in linear time, as at the beginning of this section).

Denote the output signal of $\alpha_j$ by $y(j)$. Denote the $i$-th component of $y(j)$ by $y_i(j)$. We receive the signal $y(j)$ and encode every symbol of it into a quantum state in the following way:

$$y_i(j) \to E_{y_i(j),y_i(j)}, \tag{105}$$

which means that if we receive the first symbol of the output alphabet, we encode it into the quantum state represented by $E_{1,1}$, where $E_{i,j}$ denotes the matrix with 1 in the $i$-th row and $j$-th column and 0 elsewhere. Denote this quantum sequence by $\mu_{y(j)}$. Now consider the error probability of the event that the $j$-th message was wrongly decoded:

$$E\big[1 - \mathrm{Tr}(\hat\Pi_j \mu_{y(j)} \hat\Pi_j)\big]. \tag{106}$$

We have to take the expectation value because the output sequence $y(j)$ can vary. It can easily be proved that $E[\mu_{y(j)}] = \rho_{\alpha_j}$. So the average error probability is the same as in Section 3, which means that classical messages can be decoded by a quantum apparatus in linear ($nR$) time.
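The identity $E[\mu_{y(j)}] = \rho_{\alpha_j}$ can be checked by enumerating all output sequences of a small classical channel. The sketch below represents diagonal matrices by their diagonals; the channel matrix, the codeword and the helper names are illustrative assumptions.

```python
import itertools
import math

def kron_diag(diagonals):
    """Diagonal of the tensor product of diagonal matrices."""
    result = [1.0]
    for diag in diagonals:
        result = [x * y for x in result for y in diag]
    return result

def expected_encoded_output(channel, codeword):
    """Diagonal of E[mu_y] = sum_y prod_i W(y_i|alpha(i)) E_{y,y}."""
    d = len(channel[0])
    n = len(codeword)
    total = [0.0] * d ** n
    for y in itertools.product(range(d), repeat=n):
        weight = math.prod(channel[x][yi] for x, yi in zip(codeword, y))
        index = 0
        for yi in y:   # position of |y_1 ... y_n> in the product basis
            index = index * d + yi
        total[index] += weight
    return total

channel = [[0.9, 0.1], [0.2, 0.8]]   # W(y|x), rows indexed by the input x
codeword = (0, 1, 1)
lhs = expected_encoded_output(channel, codeword)        # E[mu_y]
rhs = kron_diag([channel[x] for x in codeword])         # rho_alpha
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)
```

Both sides are diagonal with the product distribution $\prod_i W(y_i|\alpha(i))$ on the diagonal, which is what the induction in the proof establishes.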

Proof of $E[\mu_{y(j)}] = \rho_{\alpha_j}$: By induction on $n$. For $n = 1$ the statement is true by definition. Suppose it is true for $n - 1$; then for $n$

$$E[\mu_{y(j)}] = \sum_{y \in \{1,\ldots,d\}^n} \prod_{i=1}^{n} W(y_i|\alpha_j(i))\, \mu_{y} = \tag{107}$$

$$= \sum_{y' \in \{1,\ldots,d\}^{n-1}} \prod_{i=1}^{n-1} W(y'_i|\alpha_j(i))\, \mu_{y'} \otimes \sum_{a \in \{1,\ldots,d\}} W(a|\alpha_j(n)) E_{a,a}, \tag{108}$$

where $y'$ denotes the first $n - 1$ symbols of $y$. But by definition the last quantity, $\sum_{a \in \{1,\ldots,d\}} W(a|\alpha_j(n)) E_{a,a}$, is nothing else than $\rho_{\alpha_j(n)}$, so

$$E[\mu_{y(j)}] = \sum_{y' \in \{1,\ldots,d\}^{n-1}} \prod_{i=1}^{n-1} W(y'_i|\alpha_j(i))\, \mu_{y'} \otimes \rho_{\alpha_j(n)}, \tag{109}$$

but for $n - 1$ the statement is true, so

$$E[\mu_{y(j)}] = \rho_{\alpha_j}. \tag{110}$$

A Proof of Lemma 2

The proof is based on typical sequences. The following definition is a simplified and modified version of the one in [6].

Definition 1 (Typical sequence) For a given probability distribution $P$ on $\{1, 2, \ldots, d\}$, a sequence $x \in \{1, 2, \ldots, d\}^n$ is called $P$-typical with constant $\delta$ if

$$\Big|\frac{1}{n} N(a|x) - P(a)\Big| \le \delta \quad \text{for every } a \in \{1, 2, \ldots, d\}, \tag{111}$$

where $N(a|x)$ is the number of occurrences of $a$ in the sequence $x$, and, in addition, no $a \in \{1, 2, \ldots, d\}$ with $P(a) = 0$ occurs in $x$. The set of such sequences will be denoted by $T^n_{[P]_\delta}$ or simply $T_{[P]}$.

Remark 3 If a sequence $x$ is $P$-typical as above, then

$$\Big| -\sum_{a=1}^{d} \frac{N(a|x)}{n} \log(P(a)) + \sum_{a=1}^{d} P(a) \log(P(a)) \Big| \le K d \delta \tag{112}$$

if $\delta$ is small enough, because if $P(b) = 0$ then $N(b|x) = 0$, so $\frac{N(b|x)}{n} - P(b) = 0$ and the $b$-th element of the sum is 0. If $P(a) > 0$ then $\log(P(a))$ is finite, so $\max_{a : P(a) > 0} [-\log(P(a))] = K$ is finite, and the above sum is smaller than $K d \delta$.
Lemma 4 For every distribution $P$ on $\{1, 2, \ldots, d\}$ and for every $\beta > 0$,

$$P^n(T^n_{[P]_\delta}) \ge 1 - \beta \tag{113}$$

if $n$ is large enough.

Proof: If $X = X_1, X_2, \ldots, X_n$ is an i.i.d. random sequence with distribution $P$, then the random variable $N(a|X)$ has expectation value $nP(a)$ and variance $nP(a)(1 - P(a)) \le \frac{n}{4}$. Thus by Chebyshev's inequality

$$\Pr\big\{ |N(a|X) - nP(a)| > n\delta \big\} \le \frac{1}{4 n \delta^2} \tag{114}$$

for every $a \in \{1, 2, \ldots, d\}$. From this the assertion follows. ■
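Definition 1 and the Chebyshev argument can be made concrete as follows. The sketch checks the frequency condition (111) for a given sequence and evaluates a union bound over the $d$ letters of the standard Chebyshev estimate $1/(4n\delta^2)$, which follows from the variance bound $n/4$ stated in the proof; the function names are illustrative.

```python
from collections import Counter

def is_frequency_typical(sequence, dist, delta):
    """Check (111): every relative frequency is within delta of P(a),
    and no letter of zero probability occurs."""
    n = len(sequence)
    counts = Counter(sequence)
    for a, p in enumerate(dist):
        if p == 0 and counts[a] > 0:
            return False
        if abs(counts[a] / n - p) > delta:
            return False
    return True

def chebyshev_atypical_bound(d, n, delta):
    """Union bound over d letters of the estimate 1/(4 n delta^2)."""
    return min(1.0, d / (4 * n * delta ** 2))

dist = [0.9, 0.1]
print(is_frequency_typical([0] * 90 + [1] * 10, dist, 0.05))
print(is_frequency_typical([0] * 100, dist, 0.05))
print(chebyshev_atypical_bound(d=2, n=10000, delta=0.05))
```

For fixed $\delta$ the bound decays like $1/n$, so the probability of the typical set exceeds $1 - \beta$ once $n \ge d / (4\beta\delta^2)$.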
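Proof of Lemma 2: With these typical sequences we can build a typical subspace as follows.

Let $\sum_{i=1}^{d} \lambda_i |u_i\rangle\langle u_i| = \rho$ be a spectral decomposition of $\rho$. Now define $P$ as $P(a) = \lambda_a$, $a \in \{1, 2, \ldots, d\}$ (in this case $H(P) = S(\rho)$), and let $n$ be large enough to satisfy (113) with $\beta = \varepsilon/2$. Now we define $\tilde\Pi$, the typical projection of $\rho^{\otimes n}$, as follows:

$$\tilde\Pi = \sum_{x \in T^n_{[P]_\delta}} |u_{x_1}\rangle\langle u_{x_1}| \otimes |u_{x_2}\rangle\langle u_{x_2}| \otimes \cdots \otimes |u_{x_n}\rangle\langle u_{x_n}|. \tag{115}$$

Observe that if a sequence $x$ differs from a sequence $x'$, then the minimal projection generated by $x$ is orthogonal to the minimal projection generated by $x'$.

For this projection $\tilde\Pi$ the assertions (22) and (23) of Lemma 2 are valid. Consider the first assertion:

$$\mathrm{Tr}(\tilde\Pi \rho^{\otimes n} \tilde\Pi) = \mathrm{Tr}(\tilde\Pi \rho^{\otimes n}) = \tag{116}$$

$$= \mathrm{Tr}\Big(\sum_{x \in T^n_{[P]_\delta}} |u_{x_1}\rangle\langle u_{x_1}| \rho \otimes |u_{x_2}\rangle\langle u_{x_2}| \rho \otimes \cdots \otimes |u_{x_n}\rangle\langle u_{x_n}| \rho\Big) = \tag{117}$$

$$= \sum_{x \in T^n_{[P]_\delta}} \prod_{i=1}^{n} \langle u_{x_i}| \rho |u_{x_i}\rangle = \sum_{x \in T^n_{[P]_\delta}} \prod_{i=1}^{n} \lambda_{x_i} = \tag{118}$$

$$= \sum_{x \in T^n_{[P]_\delta}} P^n(x) = P^n(T^n_{[P]_\delta}) \ge 1 - \beta \tag{119}$$

if $n$ is large enough, $n \ge N_1$; the last row comes from the definition of $P$ and the previous lemma.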

Observe that (23) is true because of Remark 3. The spectrum of $\tilde\Pi \rho^{\otimes n} \tilde\Pi$ is equal to

$$\mathrm{spect}(\tilde\Pi \rho^{\otimes n} \tilde\Pi) = \Big\{ \prod_{a=1}^{d} \lambda_a^{N(a|x)},\ x \in T^n_{[P]_\delta} \Big\}, \tag{120}$$

where

$$\prod_{a=1}^{d} \lambda_a^{N(a|x)} = 2^{\,n \sum_{a=1}^{d} \frac{N(a|x)}{n} \log(\lambda_a)}, \tag{121}$$

and from Remark 3 we know that for all $x \in T^n_{[P]_\delta}$

$$\prod_{a=1}^{d} \lambda_a^{N(a|x)} \le 2^{-n(S(\rho) - K d \delta)}, \tag{122}$$

where $K d \delta$ is smaller than $\beta$ if $\delta$ is small enough.

We know from [1] that, if $n$ is large enough, there is another projection $\bar\Pi$ which satisfies (22) and (24). Now the projection which satisfies all the assertions of the Lemma is given by $\Pi = \tilde\Pi \wedge \bar\Pi$, because
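$$\mathrm{Tr}(\Pi^c \rho^{\otimes n} \Pi^c) \le \mathrm{Tr}(\tilde\Pi^c \rho^{\otimes n} \tilde\Pi^c) + \mathrm{Tr}(\bar\Pi^c \rho^{\otimes n} \bar\Pi^c) \le 2\beta,$$

$$\|\Pi \rho^{\otimes n} \Pi\| \le \|\tilde\Pi \rho^{\otimes n} \tilde\Pi\| \le 2^{-n(S(\rho) - \beta)},$$

$$\mathrm{Tr}(\Pi \sigma^{\otimes n} \Pi) \le \mathrm{Tr}(\bar\Pi \sigma^{\otimes n} \bar\Pi) \le 2^{-n(D(\rho\|\sigma) - \beta)}.$$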